Text File | 1988-08-18 | 10KB | 304 lines

*************************************************************
TxTool, a Utility for Word Processing
(c) 1988 by Don E. Farmer
16810 Deer Creek Dr.
Spring, TX 77379

This is SHAREWARE and may not be sold by anyone.
PLEASE DONATE $5.00 FOR THIS PROGRAM.
**************************************************************

TxTool is a GEM application for the Atari ST. It is a utility program designed to be used as an adjunct to a word processor, and using it will increase your word processing power. TxTool computes word counts, checks spelling, reports questionable usage of English, and performs search-and-replace operations that are specified by a file.

Except for word counts, TxTool requires auxiliary files, which are termed "dictionaries." You build these with your word processor. How powerful TxTool is for you depends largely on the effort you put into tailoring these dictionaries to your needs. (I have included some of the dictionaries I use in this arc. Please note that the spelling dictionary is only a start. You may wish to purchase one from Austin Code Works rather than making your own.)

When this GEM application is opened, the menu bar displays "Desk", "File", "Options", and "Help." Under "Help" are the selections "General", "Counts", "Spell", "Usage", and "StrSub." Selecting these leads to dialog boxes that remind you how to operate TxTool. They cannot take the place of the information provided here.

In general operation, TxTool reads an input text file, which should be ASCII for reliable results, along with the appropriate dictionary, and then writes its results to an output file. The "Counts" option requires no dictionary. In all cases, the input text and the dictionary are left unaltered when TxTool is run.

Keyboard shortcuts are also provided as an alternative to the mouse. Typing "Control I", for example, displays the File Selector so that you can choose the path name of the input file.
The key combinations are shown in the menus beside the items they select, with the character "^" designating "Control." Using the keystrokes is faster, but not easier, than using the mouse.

The counts that TxTool does are of words, sentences, words per sentence, and word frequency, this last being an alphabetized list of every word used in the input text along with the number of its occurrences. The algorithm for this is a slight modification of one given in Kernighan and Ritchie's The C Programming Language, as is the binary search function used in the spelling checker.

A "word" is a string of at most 32 characters made up of ASCII letters, apostrophes, and hyphens. A sentence is at most 512 characters long and is a string of words terminated by a period, a question mark, or an exclamation mark. This punctuation is necessary, for TxTool processes sentences, not lines, and will not work without it.

The number of words and sentences is often useful. The distribution of words per sentence indicates how much variety in sentence length there is. And with the word frequency list, the overuse of a particular word is easily spotted. This list can also be loaded into a word processor and edited to add to a spelling dictionary.

The spelling dictionary is an ASCII file containing an alphabetized list of words, one word of at most 32 characters on each line. The words should have neither leading nor trailing spaces, nor anything embedded in them that is not a letter, an apostrophe, or a hyphen. This dictionary must be alphabetized in ASCII order. If you are building a "SPELL.DIC" with a line sort that offers "dictionary order", do not use it! Use the standard ASCII sort instead. The spelling checker is not sensitive to case, and apostrophes are significant: they are treated as letters and as part of the word they appear in.
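
The word-frequency list described above can be built the K&R way: read the words one at a time, insert each into a binary tree (bumping a counter on a repeat), and then walk the tree in order. Below is a minimal sketch in modern C, not TxTool's actual source. The function names follow K&R's example or are hypothetical (lookup, count_words), and truncating overlong words and treating "The" and "the" as different words are assumptions of this sketch. Because the tree is ordered with strcmp(), the list comes out in ASCII order, which is the order the spelling dictionary requires.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXWORD 32  /* longest word, in characters */

/* One distinct word and its count, as in K&R's word-count tree. */
struct tnode {
    char *word;
    int count;
    struct tnode *left, *right;
};

/* Word characters: ASCII letters, apostrophe, and hyphen. */
static int wordchar(int c)
{
    return isalpha(c) || c == '\'' || c == '-';
}

/* Copy the next word at *p into buf (MAXWORD + 1 bytes) and advance *p.
   Longer words are truncated (an assumption). Returns 0 at end of text. */
static int getword(const char **p, char *buf)
{
    const char *s = *p;
    int n = 0;

    while (*s != '\0' && !wordchar((unsigned char)*s))
        s++;                                  /* skip non-word characters */
    while (*s != '\0' && wordchar((unsigned char)*s)) {
        if (n < MAXWORD)
            buf[n++] = *s;
        s++;
    }
    buf[n] = '\0';
    *p = s;
    return n > 0;
}

/* Insert w or bump its count; strcmp() orders the tree by ASCII code. */
static struct tnode *addtree(struct tnode *p, const char *w)
{
    int cond;

    if (p == NULL) {
        p = malloc(sizeof *p);
        if (p == NULL || (p->word = malloc(strlen(w) + 1)) == NULL) {
            perror("malloc");
            exit(EXIT_FAILURE);
        }
        strcpy(p->word, w);
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0) {
        p->count++;
    } else if (cond < 0) {
        p->left = addtree(p->left, w);
    } else {
        p->right = addtree(p->right, w);
    }
    return p;
}

/* In-order walk: prints the alphabetized frequency list. */
static void treeprint(const struct tnode *p, FILE *out)
{
    if (p != NULL) {
        treeprint(p->left, out);
        fprintf(out, "%5d %s\n", p->count, p->word);
        treeprint(p->right, out);
    }
}

/* How many times was w seen? 0 if never. */
static int lookup(const struct tnode *p, const char *w)
{
    while (p != NULL) {
        int cond = strcmp(w, p->word);
        if (cond == 0)
            return p->count;
        p = (cond < 0) ? p->left : p->right;
    }
    return 0;
}

/* Add every word of text to *root; returns how many words were read.
   (This sketch never frees the tree.) */
static int count_words(const char *text, struct tnode **root)
{
    char buf[MAXWORD + 1];
    int n = 0;

    while (getword(&text, buf)) {
        *root = addtree(*root, buf);
        n++;
    }
    return n;
}
```

For example, count_words("The cat and the hat and the bat.", &root) returns 8 and records "the" twice and "The" once. treeprint(root, stdout) then lists "The" first, because uppercase letters sort before lowercase ones in ASCII.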

The checker does a binary search of the dictionary for each word in the input text and writes the words it does not find to the output file. You do not have to listen to a chorus of dings nor tire your trigger finger clicking repeatedly on "OK." You can load the results of the spelling checker into your word processor, edit them, and then let "StrSub" make the corrections for you. The checker's search is efficient, particularly since no data compaction or pointer hashing is involved. (Isn't cheap memory wonderful?)

To determine how many words the spelling dictionary can hold, you need to know how much free RAM remains while the program is executing. One fourth of this RAM is allotted to pointers to the entries, while the other three fourths hold the actual text. Suppose, for example, that 400K of RAM was free for the heap (that is, the memory available for dynamic allocation) while TxTool was resident. Then 100K would be taken up by pointers, and since a pointer requires four bytes, the spelling dictionary could hold at most 25K words. (Although there are about 500K words in the English language, it is said that the average person uses fewer than 10K. I'm sure the owner of an ST uses more!)

You can approximate the heap by adding up the sizes of TXTOOL.PRG, TXTOOL.RSC, 32K (TxTool takes this for its stack), and your desk accessories, and then subtracting the total from your memory size. This, I must emphasize, is only an approximation, for we depend on the GEMDOS allocator Malloc(), which has been known to have its quirks.

A usage dictionary will help you avoid mistakes in idiom, such as using "off of" for "off" and "over with" for "over." It will help you avoid trite and redundant expressions such as "neither rhyme nor reason" and "a smile on his face." (Where but on a face would a smile be?) A usage dictionary can have up to 7000 lines; each entry takes two lines, and no line may be longer than 64 characters.
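
A usage dictionary is simply a text file of two-line entries, so a loader for it is short. The sketch below reads pairs of lines with fgets(), accepts either LF or CR LF line endings, and rejects any line longer than 64 characters. It illustrates the file format only and is not TxTool's code. The names struct entry, read_dict_line, and load_dictionary are hypothetical, and so is the policy of failing on an overlong line or an unpaired last line instead of skipping it.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE  64    /* longest dictionary line, in characters */
#define MAXLINES 7000  /* most lines in a usage dictionary */

/* One two-line entry: a target and the line that goes with it. */
struct entry {
    char target[MAXLINE + 1];
    char text[MAXLINE + 1];
};

/* Read one line into buf (MAXLINE + 1 bytes) without its line ending,
   which may be LF or CR LF. Returns 1 on success, 0 at end of file,
   and -1 if the line is longer than MAXLINE characters. */
static int read_dict_line(FILE *fp, char *buf)
{
    char tmp[MAXLINE + 3];          /* MAXLINE chars, "\r\n", and '\0' */
    size_t len;

    if (fgets(tmp, sizeof tmp, fp) == NULL)
        return 0;
    len = strcspn(tmp, "\r\n");     /* length without the line ending */
    if (len > MAXLINE)
        return -1;
    tmp[len] = '\0';
    memcpy(buf, tmp, len + 1);
    return 1;
}

/* Load at most max entries from fp into dict. Returns the number loaded,
   or -1 on an overlong line or a target with no second line. */
static int load_dictionary(FILE *fp, struct entry *dict, int max)
{
    int n = 0, r = 0;

    while (n < max && (r = read_dict_line(fp, dict[n].target)) == 1) {
        if (read_dict_line(fp, dict[n].text) != 1)
            return -1;
        n++;
    }
    return (r == -1) ? -1 : n;
}
```

Called with max set to MAXLINES / 2, the loader stops after 3500 entries, which corresponds to the 7000-line limit.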

The first line of an entry might be "a smile on his face" and the second, the report "REDUNDANT." For each sentence in the input text, TxTool searches the target lines of the usage dictionary for a match. When a match is made, the line following the target is included in the report written to the output file. In a target line, the character '?' serves as a wild card that matches any single character. Thus, "h??" could match either "him" or "her."

I have found it convenient to keep a number of usage dictionaries, for example "IDIOM.DIC", "TRITE.DIC", and "WORDY.DIC", and to name my output files "*.IDM", "*.TRI", and "*.WOR." This is just a suggestion, however, as TxTool lets you name your files anything you wish.

Two warnings are in order. First, the dictionary search is necessarily linear, so if you have 3500 trite expressions, and there might well be that many, you might want to go for a nice long walk, because searching each sentence of your input 3500 times will take a while. Second, and I suppose there is some humor in this, working with trite expressions is like being around the plague: you're apt to catch it! You find yourself using ones you had never heard of until you started searching for them in Fowler's Modern English Usage. The ones new to you sound pretty good; that, of course, is why they got worn out so readily.

The dictionary for string substitution, "StrSub", also has two lines for each entry, each no longer than 64 characters. The first line, again, is a target string, but the second is the replacement text. When the target is matched in the input file, the replacement text is substituted for it. Again, the search is linear, and a target string is replaced at most once in each sentence. For example, suppose you wanted to replace "cats" with "dogs" in your text. Your dictionary entry would read:

     cats
     dogs

and if your input text was:

     Cats cats cats.

your output would be:

     Cats dogs cats.
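
Both the usage check and StrSub come down to scanning a sentence for each dictionary target in turn. The sketch below shows one way to do each in C: usage_match() treats '?' in a target as a single-character wild card, and substitute_once() replaces only the first case-sensitive occurrence of a target in a sentence buffer. This is an illustration under assumptions, not TxTool's source. The function names, the report layout in check_usage(), and the refusal to make a substitution that would overflow the buffer are all hypothetical, and since the text does not say whether StrSub targets honor '?', this version treats them literally.

```c
#include <stdio.h>
#include <string.h>

/* Does pat match the text starting at s? '?' matches any one character. */
static int match_here(const char *pat, const char *s)
{
    for (; *pat != '\0'; pat++, s++)
        if (*s == '\0' || (*pat != '?' && *pat != *s))
            return 0;
    return 1;
}

/* Does the usage target pat occur anywhere in the sentence? */
static int usage_match(const char *sentence, const char *pat)
{
    const char *s = sentence;

    do {
        if (match_here(pat, s))
            return 1;
    } while (*s++ != '\0');
    return 0;
}

/* Linear scan of the usage targets: report each hit, return the count. */
static int check_usage(const char *sentence, const char *const targets[],
                       const char *const reports[], int n, FILE *out)
{
    int hits = 0;

    for (int i = 0; i < n; i++)
        if (usage_match(sentence, targets[i])) {
            fprintf(out, "%s\n    %s\n", sentence, reports[i]);
            hits++;
        }
    return hits;
}

/* StrSub step: replace the first (case-sensitive) occurrence of target
   with repl, in place, in a buffer of size bytes. Returns 1 if replaced,
   0 if the target is absent or the result would not fit. */
static int substitute_once(char *sentence, size_t size,
                           const char *target, const char *repl)
{
    size_t tlen = strlen(target), rlen = strlen(repl);
    char *hit;

    if (tlen == 0 || (hit = strstr(sentence, target)) == NULL)
        return 0;
    if (strlen(sentence) - tlen + rlen + 1 > size)
        return 0;
    /* Shift the tail (with its '\0') to make room, then copy repl in. */
    memmove(hit + rlen, hit + tlen, strlen(hit + tlen) + 1);
    memcpy(hit, repl, rlen);
    return 1;
}
```

Applied to "Cats cats cats.", substitute_once() with the target "cats" and replacement "dogs" gives "Cats dogs cats.", which matches the example above. A second call, like a second pass over the file, gives "Cats dogs dogs."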

The first "Cats" is not replaced because matching is case-sensitive, and the last "cats" is not replaced because only one replacement is made per sentence. In practice this is not much of a restriction, as you can "bucket brigade" your files through multiple passes. You can even use StrSub as a gender changer, turning "him" into "her", "he" into "she", and so on, if you are writing reports concerning specific male and female "persons."

Most of the constraints mentioned so far are manifest constants in the C source code and can be changed should you compile it again. I used Megamax's Laser C, but see no reason why it could not be compiled with another C compiler. Making other changes to the source code is not recommended unless you are an experienced C programmer. THE C SOURCE CODE IS AVAILABLE FROM ME FOR $15.00.

Getting good dictionaries together is where your efforts will be rewarded. Writing is hard work. Trying to come up with the right words at the right time is chore enough without worrying about your "off of"s and "acid test"s. With TxTool to assist you, this editing can be done in advance and need be done only once. Then you can allow the muse to flow freely. But do watch out for those "aching voids" and "blushing brides"!